# Knowledge Distillation

Openr1 Distill 7B · open-r1 · Apache-2.0
OpenR1-Distill-7B is a post-trained version of Qwen2.5-Math-7B on the Mixture-of-Thoughts dataset, designed to teach language models step-by-step reasoning.
Large Language Model · Transformers · English · 134 downloads · 6 likes

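Checkpoints like this one are normally driven through the standard transformers causal-LM API. A minimal generation sketch, assuming the repo id open-r1/OpenR1-Distill-7B and that the checkpoint ships a chat template:

```python
# Minimal generation sketch (repo id assumed: open-r1/OpenR1-Distill-7B).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "open-r1/OpenR1-Distill-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [{"role": "user", "content": "Solve step by step: what is 12 * 17?"}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
output = model.generate(input_ids, max_new_tokens=512)
# Print only the newly generated tokens (the model's step-by-step answer).
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
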
Unime LLaVA 1.6 7B · DeepGlint-AI · MIT
UniME is a universal multimodal embedding model built on a multimodal large language model (LLaVA-1.6-7B), trained at 336×336 image resolution and ranked first on the MMEB leaderboard.
Image-to-Text · Transformers · English · 188 downloads · 3 likes

Unime Phi3.5 V 4.2B · DeepGlint-AI · MIT
UniME is a universal multimodal embedding model built on a multimodal large language model (Phi-3.5-V), designed to break down modality barriers and enable cross-modal retrieval and embedding learning.
Multimodal Alignment · Transformers · English · 54 downloads · 4 likes

Splade Disco Human Mistral · slupart
A conversational search model built on SPLADE++, using a multi-teacher distillation strategy to improve semantic understanding of multi-turn dialogue queries.
Text Embedding · English · 27 downloads · 3 likes

Splade Disco Human · slupart
A conversational search adaptation of SPLADE++ whose query encoder is fine-tuned on the QReCC dataset to improve multi-turn conversational search.
Text Embedding · English · 22 downloads · 2 likes

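Both SPLADE entries above score queries by sparse lexical expansion: a masked-LM head produces a vocabulary-sized weight vector via log(1 + ReLU(logits)) with max pooling over token positions. A simplified sketch of that encoding step, assuming the query encoder loads as a standard masked-LM checkpoint (the repo id slupart/splade-disco-human and this exact preprocessing are assumptions; the authors' own tooling may differ):

```python
# Simplified SPLADE-style sparse query encoding (not the authors' exact pipeline).
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "slupart/splade-disco-human"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

query = "and how long does its battery last?"
batch = tokenizer(query, return_tensors="pt")
with torch.no_grad():
    logits = model(**batch).logits                      # (1, seq_len, vocab_size)

# SPLADE activation: log(1 + ReLU(logits)), max-pooled over non-padding tokens.
weights = torch.log1p(torch.relu(logits)) * batch["attention_mask"].unsqueeze(-1)
sparse_vec = weights.max(dim=1).values.squeeze(0)       # (vocab_size,)

# Show the highest-weighted expansion terms.
top = torch.topk(sparse_vec, k=10)
for score, idx in zip(top.values.tolist(), top.indices.tolist()):
    print(tokenizer.convert_ids_to_tokens(idx), round(score, 2))
```
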
Minimaid L2 · N-Bot-Int · Apache-2.0
MiniMaid-L2 is a role-play specialized model further optimized from MiniMaid-L1, achieving outstanding performance among 3B-scale models through knowledge distillation and training on a larger dataset.
Large Language Model · Transformers · English · 63 downloads · 2 likes

Distill Any Depth Small Hf · xingyang1 · MIT
Distill-Any-Depth is a state-of-the-art monocular depth estimation model trained with a knowledge distillation algorithm, delivering efficient and accurate depth estimation.
3D Vision · Transformers · 1,214 downloads · 3 likes

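The "-hf" suffix suggests a transformers-compatible export, so the depth-estimation pipeline should apply; a sketch under that assumption (repo id taken from the listing as xingyang1/Distill-Any-Depth-Small-hf, input image hypothetical):

```python
# Monocular depth estimation sketch; assumes a transformers-compatible export.
from PIL import Image
from transformers import pipeline

depth = pipeline("depth-estimation", model="xingyang1/Distill-Any-Depth-Small-hf")
image = Image.open("room.jpg")              # any RGB image
result = depth(image)
result["depth"].save("room_depth.png")      # depth map rendered as an image
print(result["predicted_depth"].shape)      # raw depth tensor
```
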
ARWKV R1 1B5 · RWKV-Red-Team · Apache-2.0
ARWKV-R1-1B5 is an early preview of an RNN-based 7B model line, trained through three-stage knowledge distillation from DeepSeek-R1-Distill-Qwen-1.5B, with a 2k context length.
Large Language Model · Transformers · Multilingual · 164 downloads · 4 likes

Deepseer R1 Vision Distill Qwen 1.5B Google Vit Base Patch16 224 · mehmetkeremturkcan · Apache-2.0
DeepSeer is a vision-language model built on DeepSeek-R1-Distill-Qwen-1.5B with a ViT image encoder, supporting chain-of-thought reasoning and trained with dialogue templates for vision models.
Image-to-Text · Transformers · 25 downloads · 2 likes

Qwen2.5 14B DeepSeek R1 1M Uncensored · FiditeNemini
A 14B-parameter large language model based on Qwen2.5-14B-DeepSeek-R1-1M, merged with DeepSeek-R1-Distill-Qwen-14B-abliterated-v2 using the TIES merge method.
Large Language Model · Transformers · 154 downloads · 6 likes

Deepseek R1 Distill Qwen 32B Japanese · cyberagent · MIT
A Japanese large language model released by CyberAgent, fine-tuned for Japanese from DeepSeek-R1-Distill-Qwen-32B.
Large Language Model · Japanese · 1,190 downloads · 250 likes

Gguf Jina Reranker V1 Tiny En · Felladrin · Apache-2.0
A GGUF conversion of jina-reranker-v1-tiny-en, a model designed for ultra-fast reranking, based on the JinaBERT architecture and supporting long sequences of up to 8,192 tokens.
Text Embedding · English · 3,831 downloads · 1 like

Deepseek R1 BF16 · unsloth · MIT
An 8B-parameter model from the DeepSeek-R1 family based on the Llama architecture, provided in BF16 and geared toward efficient inference and fine-tuning.
Large Language Model · Transformers · English · 944 downloads · 22 likes

Koala Lightning 700m · etri-vilab
KOALA-Lightning-700M is an efficient text-to-image model distilled from SDXL-Lightning, significantly improving inference speed while maintaining generation quality.
Image Generation · 170 downloads · 6 likes

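KOALA checkpoints reuse the SDXL pipeline interface with a compressed U-Net, so they are typically loaded through diffusers. A sketch, assuming the repo id etri-vilab/koala-lightning-700m; the sampler settings are illustrative rather than the model card's exact recommendation:

```python
# Text-to-image sketch with diffusers (repo id assumed: etri-vilab/koala-lightning-700m).
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "etri-vilab/koala-lightning-700m", torch_dtype=torch.float16
).to("cuda")

prompt = "a watercolor painting of a lighthouse at dawn"
# Lightning-distilled models need only a few denoising steps and low guidance.
image = pipe(prompt, num_inference_steps=10, guidance_scale=3.5).images[0]
image.save("lighthouse.png")
```
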
Phi 2 Sft Ultrachat Full · lole25 · MIT
A large language model based on microsoft/phi-2 and fine-tuned on the ultrachat_200k dataset, suitable for dialogue generation tasks.
Large Language Model · Transformers · Other · 68 downloads · 2 likes

Distil Medium.en · distil-whisper · MIT
Distil-Whisper is a distilled version of the Whisper model that is 6 times faster and 49% smaller, while maintaining performance close to the original on English speech recognition tasks.
Speech Recognition · English · 186.85k downloads · 120 likes

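Distil-Whisper checkpoints plug into the standard automatic-speech-recognition pipeline; a minimal sketch with distil-whisper/distil-medium.en (the audio file name and chunk length are illustrative):

```python
# English speech recognition with Distil-Whisper via the ASR pipeline.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="distil-whisper/distil-medium.en",
    chunk_length_s=15,  # chunked long-form transcription
)
print(asr("meeting_recording.wav")["text"])
```
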
Distil Large V2 · distil-whisper · MIT
Distil-Whisper is a distilled version of the Whisper model, achieving 6x speedup and 49% size reduction with only a 1% WER difference on out-of-distribution evaluation sets.
Speech Recognition · English · 42.65k downloads · 508 likes

Rbt4 H312 · hfl · Apache-2.0
MiniRBT is a small Chinese pre-trained model built with knowledge distillation, using Whole Word Masking to improve training efficiency.
Large Language Model · Transformers · Chinese · 34 downloads · 5 likes

Minirbt H288 · hfl · Apache-2.0
MiniRBT is a small Chinese pre-trained model built with knowledge distillation, using Whole Word Masking to improve training efficiency.
Large Language Model · Transformers · Chinese · 405 downloads · 8 likes

Minirbt H256 · hfl · Apache-2.0
MiniRBT is a small Chinese pre-trained model built with knowledge distillation and whole word masking, suitable for a range of Chinese natural language processing tasks.
Large Language Model · Transformers · Chinese · 225 downloads · 7 likes

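The MiniRBT checkpoints are masked language models, so the fill-mask pipeline is a quick way to exercise them; a sketch assuming the repo id hfl/minirbt-h256:

```python
# Chinese fill-mask sketch (repo id assumed: hfl/minirbt-h256).
from transformers import pipeline

fill = pipeline("fill-mask", model="hfl/minirbt-h256")
# "Harbin is the capital of [MASK]longjiang province."
for candidate in fill("哈尔滨是[MASK]龙江的省会。"):
    print(candidate["token_str"], round(candidate["score"], 3))
```
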
Clip Vit Large Patch14 Ko · Bingsu · MIT
A Korean CLIP model trained via knowledge distillation, supporting Korean and English multimodal understanding.
Text-to-Image · Transformers · Korean · 4,537 downloads · 15 likes

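CLIP-style checkpoints like this are easiest to try through the zero-shot image classification pipeline; a sketch assuming the repo id Bingsu/clip-vit-large-patch14-ko and a hypothetical local image:

```python
# Zero-shot image classification with Korean labels (repo id assumed: Bingsu/clip-vit-large-patch14-ko).
from transformers import pipeline

classify = pipeline("zero-shot-image-classification", model="Bingsu/clip-vit-large-patch14-ko")
labels = ["고양이 사진", "강아지 사진", "a photo of a car"]  # mixed Korean/English labels
for pred in classify("cat.jpg", candidate_labels=labels):
    print(pred["label"], round(pred["score"], 3))
```
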
Re2g Qry Encoder Fever · ibm-research · Apache-2.0
Re2G combines neural initial retrieval, reranking, and generation for knowledge-intensive tasks. This model is the Re2G query encoder for FEVER, encoding questions into dense vectors for retrieval.
Text Embedding · Transformers · 17 downloads · 0 likes

Re2g Qry Encoder Nq · ibm-research · Apache-2.0
Re2G is an end-to-end system combining neural retrieval, reranking, and generation for knowledge-intensive tasks. This model serves as its Natural Questions (NQ) question encoder component.
Question Answering System · Transformers · 14 downloads · 0 likes

Distilbert Base Uncased Finetuned Squad · jhoonk · Apache-2.0
A DistilBERT-base-uncased model fine-tuned on the SQuAD question answering dataset, suitable for Q&A tasks.
Question Answering System · Transformers · 15 downloads · 0 likes

Bert Large Uncased Squadv1.1 Sparse 80 1x4 Block Pruneofa · Intel · Apache-2.0
Obtained by fine-tuning a pre-trained 80% sparse (1x4 block) Prune OFA BERT-Large model with knowledge distillation, this model performs strongly on the SQuAD v1.1 Q&A task.
Question Answering System · Transformers · English · 15 downloads · 1 like

Minilm L6 H384 Uncased · nreimers · MIT
A 6-layer lightweight version of microsoft/MiniLM-L12-H384-uncased, obtained by keeping every second layer to reduce model size.
Large Language Model · 9,300 downloads · 36 likes

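The raw MiniLM checkpoint is a plain encoder with no task head, so it is normally fine-tuned or used for mean-pooled sentence embeddings; a feature-extraction sketch assuming the repo id nreimers/MiniLM-L6-H384-uncased:

```python
# Mean-pooled sentence embeddings from the raw encoder (repo id assumed: nreimers/MiniLM-L6-H384-uncased).
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "nreimers/MiniLM-L6-H384-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

sentences = ["Knowledge distillation compresses models.", "Distilled models run faster."]
batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    hidden = model(**batch).last_hidden_state              # (batch, seq_len, 384)

mask = batch["attention_mask"].unsqueeze(-1).float()
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # average over real tokens only
print(embeddings.shape)                                     # torch.Size([2, 384])
```
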
Tinybert L 4 H 312 V2 · nreimers
TinyBERT is a lightweight BERT model developed by Huawei Noah's Ark Lab, which compresses the model size through knowledge distillation while maintaining high performance.
Large Language Model · 5,166 downloads · 1 like

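TinyBERT, MiniLM, and the other distilled encoders in this list share one core recipe: a small student is trained to match a large teacher's soft predictions. A generic sketch of that loss (temperature-scaled KL divergence plus the usual hard-label cross-entropy); TinyBERT itself additionally distills hidden states and attention maps, which this sketch leaves out:

```python
# Generic knowledge-distillation loss: soft-target KL + hard-label cross-entropy.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    # Soft targets: KL divergence between temperature-softened distributions.
    # Scaling by T^2 keeps gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature**2
    # Hard targets: standard cross-entropy against ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Toy usage with random logits for a 3-class task.
student_logits = torch.randn(4, 3, requires_grad=True)
teacher_logits = torch.randn(4, 3)
labels = torch.tensor([0, 2, 1, 0])
print(distillation_loss(student_logits, teacher_logits, labels))
```
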
Minilmv2 L6 H384 Distilled From BERT Base · nreimers
MiniLMv2 is a lightweight pre-trained language model from Microsoft, distilled from BERT-Base for efficient inference.
Large Language Model · Transformers · 179 downloads · 0 likes

Dkrr Dpr Nq Retriever · castorini
A DPR retriever for Natural Questions trained with the "Distilling Knowledge from Reader to Retriever" (DKRR) approach, which improves retrieval by distilling the FiD reader's knowledge into the retriever.
Question Answering System · Transformers · 38 downloads · 0 likes

Tct Colbert V2 Hnp Msmarco · castorini
TCT-ColBERT-V2 is a dense retrieval model trained with a tightly-coupled teacher mechanism and in-batch negative knowledge distillation, designed for efficient text retrieval.
Text Embedding · Transformers · 1,382 downloads · 4 likes

Tct Colbert V2 Msmarco · castorini
TCT-ColBERT-V2 is a dense retrieval model based on knowledge distillation, improving retrieval efficiency and quality through a tightly-coupled teacher mechanism and in-batch negative training.
Text Embedding · Transformers · 2,220 downloads · 0 likes

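In Pyserini these checkpoints are wrapped by dedicated encoder classes; the underlying idea is to pool BERT token embeddings into one dense vector per query or passage and rank by dot product. A simplified sketch under that assumption (repo id from the listing; the official encoders add query/passage markers and specific pooling rules that this sketch omits):

```python
# Simplified dense-retrieval scoring sketch (not the exact Pyserini preprocessing).
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "castorini/tct_colbert-v2-hnp-msmarco"  # repo id taken from the listing
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

def encode(texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state
    mask = batch["attention_mask"].unsqueeze(-1).float()
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)    # mean pooling

query_vec = encode(["what causes tides"])
passage_vecs = encode([
    "Tides are caused by the gravitational pull of the moon and the sun.",
    "The stock market closed higher today.",
])
print(query_vec @ passage_vecs.T)  # dot-product relevance scores, higher = more relevant
```
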
Distill Bert Base Spanish Wwm Cased Finetuned Spa Squad2 Es · mrm8488 · Apache-2.0
A Spanish Q&A model distilled from BETO and fine-tuned on SQuAD2-es, lighter and faster than the standard version.
Question Answering System · Spanish · 2,145 downloads · 48 likes

Tinybert Spanish Uncased Finetuned Ner · mrm8488
A named entity recognition model fine-tuned from Spanish TinyBERT, only 55MB in size, suitable for entity recognition in Spanish text.
Sequence Labeling · Spanish · 64 downloads · 3 likes

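Token-classification checkpoints like this plug directly into the transformers NER pipeline; a sketch assuming the repo id mrm8488/TinyBERT-spanish-uncased-finetuned-ner:

```python
# Spanish NER sketch (repo id assumed: mrm8488/TinyBERT-spanish-uncased-finetuned-ner).
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="mrm8488/TinyBERT-spanish-uncased-finetuned-ner",
    aggregation_strategy="simple",  # merge sub-tokens into whole entities
)
for entity in ner("Gabriel García Márquez nació en Aracataca, Colombia."):
    print(entity["entity_group"], entity["word"], round(entity["score"], 3))
```
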
Distilbert Base Cased Distilled Squad · distilbert · Apache-2.0
DistilBERT is a lightweight distilled version of BERT with 40% fewer parameters and 60% faster inference, retaining over 95% of BERT's performance. This checkpoint is fine-tuned on SQuAD v1.1 for extractive question answering.
Question Answering System · English · 220.76k downloads · 244 likes

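Extractive QA checkpoints like this one work out of the box with the question-answering pipeline (repo id distilbert/distilbert-base-cased-distilled-squad, matching the listing):

```python
# Extractive question answering with the distilled SQuAD model.
from transformers import pipeline

qa = pipeline("question-answering", model="distilbert/distilbert-base-cased-distilled-squad")
result = qa(
    question="What does knowledge distillation reduce?",
    context="Knowledge distillation transfers a large teacher model's behaviour to a "
            "smaller student, reducing parameter count and inference latency.",
)
print(result["answer"], round(result["score"], 3))
```
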
Distilbert Base Uncased Distilled Squad · distilbert · Apache-2.0
DistilBERT is a lightweight distilled version of BERT with 40% fewer parameters and 60% faster inference, maintaining over 95% of BERT's performance on the GLUE benchmark. This checkpoint is fine-tuned specifically for question answering.
Question Answering System · Transformers · English · 154.39k downloads · 115 likes
